A Comparison of Peak Callers Used for DNase-Seq Data

Authors

  • Hashem Koohy
  • Thomas A. Down
  • Mikhail Spivakov
  • Tim Hubbard
Abstract

Genome-wide profiling of open chromatin regions using DNase I and high-throughput sequencing (DNase-seq) is an increasingly popular approach for finding and studying regulatory elements. A variety of algorithms have been developed to identify regions of open chromatin from raw sequence-tag data, which has motivated us to assess and compare their performance. In this study, four published, publicly available peak calling algorithms used for DNase-seq data analysis (F-seq, Hotspot, MACS and ZINBA) are assessed at a range of signal thresholds on two published DNase-seq datasets for three cell types. The results were benchmarked against an independent dataset of regulatory regions derived from ENCODE in vivo transcription factor binding data for each particular cell type. The level of overlap between peak regions reported by each algorithm and this ENCODE-derived reference set was used to assess the sensitivity and specificity of the algorithms. Our study suggests that F-seq has a slightly higher sensitivity than the next best algorithms. Hotspot and the ChIP-seq oriented method, MACS, both perform competitively when used with their default parameters. However, the generic peak finder ZINBA appears to be less sensitive than the other three. We also assess the accuracy of each algorithm over a range of signal thresholds. In particular, we show that the accuracy of F-seq can be considerably improved by using a threshold setting that is different from the default value.
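The overlap-based benchmarking described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions of sensitivity and specificity used in the study, so the two fractions below are assumptions chosen for illustration: the share of reference regions hit by at least one peak, and the share of peaks that hit a reference region. The function names, the half-open `(chrom, start, end)` tuple layout and the toy coordinates are all hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_by_chrom(regions):
    """Group half-open (chrom, start, end) regions by chromosome and merge
    overlapping or touching ones, returning sorted start and end lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def overlaps(index, chrom, start, end):
    """True if [start, end) overlaps any merged interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged interval that begins before the query ends.
    i = bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start


def fraction_overlapping(queries, targets):
    """Fraction of query regions that overlap at least one target region."""
    if not queries:
        return 0.0
    index = merge_by_chrom(targets)
    hits = sum(overlaps(index, c, s, e) for c, s, e in queries)
    return hits / len(queries)


# Toy data (hypothetical coordinates, not taken from the study).
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
reference = [("chr1", 150, 250), ("chr1", 900, 1000),
             ("chr2", 10, 40), ("chr2", 60, 70)]

# Assumed sensitivity proxy: reference regions recovered by the peak set.
sensitivity = fraction_overlapping(reference, peaks)
# Assumed precision-like proxy: peaks supported by the reference set.
supported = fraction_overlapping(peaks, reference)
print(f"sensitivity={sensitivity:.2f} supported_peaks={supported:.2f}")
```

Repeating this calculation for the peak sets produced at each signal threshold would give one point per threshold, which is one way to compare callers across a range of settings.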

Similar resources

Corrigendum: Comparative evaluation of DNase-seq footprint identification strategies

DNase I is an enzyme preferentially cleaving DNA in highly accessible regions. Recently, Next-Generation Sequencing has been applied to DNase I assays (DNase-seq) to obtain genome-wide maps of these accessible chromatin regions. With high-depth sequencing, DNase I cleavage sites can be identified with base-pair resolution, revealing the presence of protected regions ("footprints"), correspondin...

ChIP-SEQ AND PEAK CALLING

et al. details each of these steps and discusses how peak-finding tools approach the separate steps very differently (3). A follow-up review by Wilbanks et al. evaluated the performance of 11 ChIP-seq peak callers, nearly all of which are still widely used today (4). Each step has parameters that the user can adjust, but changing these can significantly affect the final peak lists. Ca...

Computational analysis of CLIP-seq data.

CLIP-seq experiments are currently the most important means for determining the binding sites of RNA binding proteins on a genome-wide level. The computational analysis can be divided into three steps. In the first pre-processing stage, raw reads have to be trimmed and mapped to the genome. This step has to be specifically adapted for each CLIP-seq protocol. The next step is peak calling, which...

GTRD: a database of transcription factor binding sites identified by ChIP-seq experiments

GTRD, the Gene Transcription Regulation Database (http://gtrd.biouml.org), is a database of transcription factor binding sites (TFBSs) identified by ChIP-seq experiments for human and mouse. Raw ChIP-seq data were obtained from ENCODE and SRA and uniformly processed: (i) reads were aligned using Bowtie2; (ii) ChIP-seq peaks were called using peak callers MACS, SISSRs, GEM and PICS; (iii) peaks for th...

Journal title:

Volume 9, Issue

Pages -

Publication date: 2014